Previous ARC studies required participants to manually provide detailed context about each GPS location they visited.
This participant-provided information was then used to create predictive features for lapse probability.
The GPS2 project investigates whether we can achieve similar predictive utility by extracting features from participants’ routine patterns and location metadata, all without requiring any manual contextualization.
This approach would ideally reduce participant burden while also potentially capturing behavioral patterns that participants themselves might not consciously recognize or report.
The ideal workflow for GPS data is as follows:
1. Group nearby stationary GPS points into discrete location representations. This allows us to create interpretable routine locations from raw coordinate clusters.
2. Apply geocoding services to automatically retrieve contextual information about each clustered location:
   - business type
   - neighborhood characteristics
   - proximity to high-risk venues
   - etc.
3. Transform location metadata and visit patterns into predictive features:
   - time spent at different venue types
   - stability of work/housing
   - exposure to high-risk environments

All without requiring participant input!
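The first and last steps of this workflow can be illustrated with a minimal sketch. It assumes a simple greedy, radius-based grouping (not necessarily the clustering method the project uses), a hypothetical 50 m radius, made-up coordinates, and hypothetical venue-type labels standing in for what a geocoding service might return.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Cluster:
    lat: float      # running centroid latitude
    lon: float      # running centroid longitude
    n: int          # number of stationary points merged so far
    minutes: float  # total dwell time across merged points


def cluster_points(points, radius_m=50.0):
    """Greedy grouping: attach each point to the first cluster whose
    centroid lies within radius_m, otherwise start a new cluster.

    points: iterable of (lat, lon, dwell_minutes) for stationary points.
    radius_m: hypothetical threshold, chosen only for illustration.
    """
    clusters = []
    for lat, lon, minutes in points:
        for c in clusters:
            if haversine_m(lat, lon, c.lat, c.lon) <= radius_m:
                # Update the running centroid and dwell total.
                c.lat = (c.lat * c.n + lat) / (c.n + 1)
                c.lon = (c.lon * c.n + lon) / (c.n + 1)
                c.n += 1
                c.minutes += minutes
                break
        else:
            clusters.append(Cluster(lat, lon, 1, minutes))
    return clusters


def minutes_by_venue_type(clusters, venue_types):
    """Sum dwell time per venue type (one label per cluster)."""
    totals = {}
    for c, vt in zip(clusters, venue_types):
        totals[vt] = totals.get(vt, 0.0) + c.minutes
    return totals


# Made-up stationary points: (lat, lon, dwell minutes).
points = [
    (43.0731, -89.4012, 30),
    (43.0732, -89.4013, 45),  # a few meters from the first point
    (43.0850, -89.3700, 20),  # more than a kilometer away
]
clusters = cluster_points(points)
# Hypothetical labels; in the workflow these would come from geocoding.
features = minutes_by_venue_type(clusters, ["residence", "bar"])
```

A density-based method would handle noise and irregular cluster shapes better than this greedy pass; the sketch only shows the shape of the data flowing from raw points to a per-venue-type feature.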
Question:
What is the minimum level of location data granularity required to achieve valid research outcomes while maintaining acceptable privacy protection for participants?
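One way to make this question concrete is to coarsen coordinates by rounding and look at the resulting spatial resolution. The sketch below is only an illustration of that idea, not the project's privacy method; it assumes the common approximation of about 111.32 km per degree of latitude, and the function names are hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def coarsen(lat, lon, decimals):
    """Round coordinates to a fixed number of decimal places."""
    return round(lat, decimals), round(lon, decimals)


def cell_size_m(lat, decimals):
    """Approximate (north-south, east-west) size in meters of the grid
    cell implied by rounding to `decimals` places at latitude `lat`."""
    step = 10 ** -decimals
    north_south = step * METERS_PER_DEGREE_LAT
    # Degrees of longitude shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


coarse = coarsen(43.07312, -89.40123, 3)
ns_m, ew_m = cell_size_m(43.0, 3)
```

Under this approximation, three decimal places correspond to cells roughly 111 m tall, and each additional decimal place shrinks the cell tenfold, which is the kind of trade-off the question asks about.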
┌───────────────────────────────────────────────────────────────────┐
│  Local Computer                                                   │
│                                                                   │
│  ┌─────────────────┐    ┌──────────────────────────────────────┐  │
│  │ Research Drive  │    │ Docker Environment                   │  │
│  │                 │    │                                      │  │
│  │ /Volumes/       │    │  ┌─────────────┐  ┌───────────────┐  │  │
│  │ jjcurtin/       │────┼──│ PostGIS     │  │ Nominatim     │  │  │
│  │ studydata/      │    │  │ Container   │  │ Container     │  │  │
│  │ risk/           │    │  │             │  │               │  │  │
│  │                 │    │  │ Port: 5433  │  │ Port: 8080    │  │  │
│  │ • GPS data      │    │  │           <─│──┼>              │  │  │
│  │ • Zoning Data   │    │  │ Database:   │  │ Service:      │  │  │
│  │ • OSM Data      │    │  │ gps_analysis│  │ Reverse-      │  │  │
│  └─────────────────┘    │  │             │  │ geocoding     │  │  │
│                         │  │ User:       │  │               │  │  │
│                         │  │ postgres    │  └───────────────┘  │  │
│                         │  └─────────────┘                     │  │
│                         └──────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘

                  ┌───────────────────────────────┐
                  │ Quarto Notebooks              │
                  │                               │
                  │ 01-setup-infrastructure.qmd   │
                  │ 02-data-import.qmd            │
                  │ 03-gps-processing-clustering  │
                  │ 04-reverse-geocoding.qmd      │
                  │ 05-spatial-zoning-analysis    │
                  │ 06-visualizations.qmd         │
                  └───────────────────────────────┘
                                  │
                                  ▼
   ┌─────────────────────────────────────────────────────────────┐
   │ Integration                                                 │
   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
   │  │ R           │◄─┤ SQL         ├─►│ PostGIS             │  │
   │  │             │  │             │  │                     │  │
   │  │ • dplyr     │  │ • Queries   │  │ • Spatial database  │  │
   │  │ • sf        │  │ • Joins     │  │ • Geographic data   │  │
   │  │ • leaflet   │  │ • Filtering │  │ • Spatial functions │  │
   │  │ • analysis  │  │ • Inserts   │  │ • Index operations  │  │
   │  └─────────────┘  └─────────────┘  └─────────────────────┘  │
   └─────────────────────────────────────────────────────────────┘
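
As a rough illustration of how a client might address the two containers in the diagram, the sketch below builds a Nominatim reverse-geocoding URL and a PostgreSQL connection URI. The ports, database name, and user come from the diagram; the `localhost` host, the helper names, and the sample coordinates are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: both containers are published on localhost at the
# ports shown in the diagram.
NOMINATIM_BASE = "http://localhost:8080"
POSTGIS_HOST = "localhost"
POSTGIS_PORT = 5433
POSTGIS_DB = "gps_analysis"
POSTGIS_USER = "postgres"


def reverse_geocode_url(lat, lon, zoom=18):
    """Build a URL for Nominatim's /reverse endpoint.

    `zoom` controls the level of detail of the returned address
    (18 is building level in Nominatim's scheme).
    """
    query = urlencode({"lat": lat, "lon": lon, "format": "jsonv2", "zoom": zoom})
    return f"{NOMINATIM_BASE}/reverse?{query}"


def postgis_uri():
    """Build a libpq-style connection URI (password deliberately omitted)."""
    return f"postgresql://{POSTGIS_USER}@{POSTGIS_HOST}:{POSTGIS_PORT}/{POSTGIS_DB}"


url = reverse_geocode_url(43.0731, -89.4012)  # made-up coordinates
uri = postgis_uri()
```

Lowering `zoom` asks Nominatim for a coarser result (for example, a street or suburb rather than a building), which may be relevant to the granularity question raised earlier.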